Benchmarking AlphaFold3’s protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation

AlphaFold 3 (AF3), the latest version of protein structure prediction software, goes beyond its predecessors by predicting protein-protein complexes. It could revolutionize drug discovery and protein engineering, marking a major step towards comprehensive, automated protein structure prediction. However, independent validation of AF3’s predictions is necessary. Evaluated using the SKEMPI 2.0 database which involves 317 protein-protein complexes and 8338 mutations, AF3 complex structures give rise to a very good Pearson correlation coefficient of 0.86 for predicting protein-protein binding free energy changes upon mutation, slightly less than the 0.88 achieved earlier with the Protein Data Bank (PDB) structures. Nonetheless, AF3 complex structures led to a 8.6% increase in the prediction RMSE compared to original PDB complex structures. Additionally, some of AF3’s complex structures have large errors, which were not captured in its ipTM performance metric. Finally, it is found that AF3’s complex structures are not reliable for intrinsically flexible regions or domains.


Introduction
AlphaFold3 (AF3) is the latest iteration of the groundbreaking protein structure prediction software developed by Google DeepMind and Isomorphic Labs [1].Building on the success of its predecessors, AF3 brings significant advancements to the field of computational biology.One of the key features of AF3 is its ability to predict protein-molecule complexes, including those involving DNA and RNA.This represents a major leap from AF2, which primarily focused on predicting the three-dimensional (3D) structures of proteins.The remarkable success of AF3 has opened up new possibilities for extending its purpose beyond fundamental structural prediction tasks to include protein-molecule complexes.
The significance of accurately predicting 3D protein structures using AlphaFold cannot be overemphasized.Fundamentally, 3D protein structures are instrumental in revealing protein functions, comprehending various biological interactions, and designing drugs.Numerous prevalent diseases, including Alzheimer's and Parkinson's, are linked to aberrant protein structures.Consequently, the automated prediction of protein folds from sequences has emerged as a crucial challenge in biology and is often referred to as the holy grail of molecular biophysics.In 2019, AlphaFold's [2] leading performance in predicting 25 structures out of 43 test proteins sparked enthusiasm among researchers about the potential future of AI-based protein structure prediction.For instance, DeepFragLib by Wang et al. [3] signifies a novel progression in ab initio protein structure prediction.Thereafter, the launch of AlphaFold ignited a transformative shift in the way we model protein structures and their interactions [2].AlphaFold also opened up a vast array of possibilities in protein folding, protein engineering, and design [4,5,6,7,8,9,10].
Data-driven machine learning models have demonstrated great power by utilizing 3D protein-protein complexes.Protein-protein interactions (PPIs) also play a significant role in nearly all cellular and biological activities.In the study of PPIs, mutation-induced effects play a paramount role in evolutionary biology, cancer biology, immunology, directed evolution, and protein engineering.Data-driven machine learning models have targeted the study of mutation-induced effects on protein stability and PPI bind-ing affinities.Computational approaches have primarily served as a viable alternative to experimental mutagenesis methods.Previously, scientific communities have naturally extended the capabilities of Al-phaFold by expanding the protein structural database [11].With the accessibility of AF3 through the AlphaFold Server, AF3 can potentially become a valuable tool for advancing deep learning models towards applications of PPIs.
Nonetheless, it was found that the prediction accuracy of protein engineering, which iteratively optimizes protein fitness by screening the gigantic mutational space, drops when AlphaFold2 structures were used [5].Therefore, it is imperative to independently validate AF3's accuracy and reliability for PPI analysis and mutation-induced protein-protein binding free energy (BFE) change predictions.
One of the very successful approaches for mutation-induced protein-protein BFE change predictions is the topology-based network tree (TopNetTree) [12], which is based on topological deep learning (TDL), introduced in 2017 [13].TDL is an emerging paradigm in machine learning, based on topological data analysis (TDA) [14,15].TDA, a branch of mathematics, focuses on understanding the shape and structure of data and is exceptionally successful in improving standard approaches by contributing novel topological characterization.TDL captures protein structures, simplifies the structural complexity of biomolecules, and embeds physical interactions into topological invariants.
TDA has had tremendous success through TopNetTree [12], PerSpect-EL [16], HCML [17], and TopNetmAb [18] by incorporating topological fingerprints from persistent homology (PH) and persistent Laplacian to predict PPI BFE changes upon mutation and subsequently, BFE changes due to mutations in the SARS-CoV-2 receptor binding domain (RBD) -Angiotensin-converting enzyme 2 (ACE2) complexes.In particular, persistent Laplacian provides both the topological information in PH and the homotopic shape of evolution [19].Using persistent Laplacian, Omicron BA.4 and BA.5 were predicted to be new dominant variants two months before the World Health Organization (WHO) made the announcement [18].Persistent Laplacian is also incorporated in the recent integration between TDL and pre-trained ESM transformer features for predicting mutation-induced protein solubility changes, which establishes the state-of-the-art method for protein-protein BFE change predictions [20].
In this work, we evaluate AF3's complex structure accuracy and mutation-induced BFE change prediction reliability using the largest PPI database, SKEMPI 2.0, which involves 317 protein-protein complexes and 8,330 mutation-induced BFE changes [21,12].First, we consider AF3's PPI structure accuracy.Additionally, we examine AF3's reliability in predicting protein-protein BFE changes upon mutation using the most advanced TDL method [20].Our results demonstrate that AF3-predicted complexes achieved a relatively good Pearson correlation coefficient of 0.86, which is slightly less than the 0.88 reported earlier [20].However, we found that AF3 results in 8.6% increase of root-mean-square error (RMSE) compared to the original complexes in the Protein Data Bank (PDB) for BFE change prediction.Through our analysis, we also discovered that some structurally misaligned AF3 protein-protein complexes are not captured by AF3's ipTM performance metric.Finally, AF3 complex predictions may not be reliable for highly flexible protein domains.

Results
In this section, we assess the accuracy and reliability of AF3 predictions of PPI complexes.Here, we use the accessible AlphaFold Server of AF3 to predict the protein-protein complexes from SKEMPI 2.0 database [21].This database contains the S8338 dataset for mutation-induced BFE changes, which is one of the most comprehensive datasets collected on how mutations can alter the binding affinity in PPIs, with over 8,338 entries of single mutations.However, SKEMPI 2.0 database involves certain PPI complexes, namely 3NVN and 4U6H, with restricted viral pathogenic sequences in AlphaFold Server (accessed 11th May 2024).These complexes only make up 8 samples in the S8338 dataset.Therefore, we incorporate the remaining 8330 single mutation-based samples generated from 317 PPI complexes.
The 317 predicted AF3 complexes are utilized to predict the 8330 mutation-induced binding free energy changes.More details on the SKEMPI 2.0 database can be found in the Supplementary Information.

Validation Performance
In the validation test, we conduct a 10-fold cross-validation on MT-TopLap AF3 by predicting the mutationinduced BFE changes using the features extracted from the remaining 317 AF3 structures.Existing topology-based models like TopLapNetGBT and TopNetTree, which have proven to be successful, have undergone training and validation using the S8338 dataset [18,12].
Table 1 shows the 10-fold cross validation results of MT-TopLap AF3 models for the BFE change prediction for mutations on the S8338 dataset.The performance of MT-TopLap AF3 model is evaluated based on the Pearson correlation coefficient (R p ) and RMSE.Their definitions are included in the Supplementary Information.MT-TopLap AF3 achieved an R p of 0.86 and RMSE of 1.025 kcal/mol.In comparison to non-topology-based models such as mCSM-PPI2 [22], MT-TopLap AF3 displayed an 18% improvement in its RMSE.

Structural Alignment Performance
Using with the original PDB structures for the S8338 dataset, the original MT-TopLap achieved an R p of 0.88 and RMSE of 0.937 kcal/mol.This implies an increase in 8.6% of RMSE due to the AF3 predicted Multimer, is a metric that evaluates the accuracy of the predicted interface in a protein-protein complex [23].For example, an ipTM score above 0.8 indicates that the PPI complex generated is highly confident prediction.On the other hand, an ipTM score below 0.6 suggests that the predicted structure is likely to be incorrect while any ipTM score between 0.6 and 0.8 includes prediction that are correct or wrong [23].Another AF3 scoring is pTM which is a comprehensive measure of the accuracy of prediction of the entire structure of the complex.It represents the predicted TM score for a superposition between the predicted structure and the assumed true structure.However, a good pTM score above 0.5 may be due to a correct prediction of the larger protein dominated in the protein-protein complex although the other partner protein has a poor prediction.
Figure 1 shows the statistics of 317 AF3 structures from S8338 dataset.The average RMSD, average ipTM and average pTM calculated are 1.61 Å, 0.803 and 0.847 respectively.In Figure 1A, complexes that are considered outliers have a poor RMSD greater than 4 Å.Despite this, 71.6% of the complexes have an high ipTM score of at least 0.8, and 98.7% of the complexes have a pTM score of at least 0.5.
Essentially, this shows that most of the AF3 predicted complexes have high ipTM but also have low RMSD.The four complexes (i.e.1DVF, 3LB6, 3LZF and 5TAR) with pTM below 0.5 also have ipTM scores below 0.6.

Discussion
To analyze which AF3 PPI complex is not well predicted, we first rank the 317 AF3 complexes with their RMSD and ipTM scores.Another potential limitation of AF3 is illustrated in Figure 3A and B. Figure 3A shows the PPI complex of PDB ID: 4YH7 colored by B-factor values.High B-factors or high flexible protein regions are colored in red while low B-factors or more rigid protein regions are colored in blue.Figure 3B shows the structural alignment of PDB ID: 4YH7 (in blue) with its AF3 predicted complex (in orange).It can be observed that the regions in Figure 3A with high B-factors correspond to the region in Figure 3B where the residues are misaligned between AF3 and PDDB structures.Here, the RMSD per residue is calculated based on the backbone atoms of the protein.To investigate this further, we observe a similar pattern when all residue-based B-factors are plotted against residue-based RMSD for all 317 PPI complexes in Figure 3C.The colors in Figure 3C represent the 20 canonical amino acid types.In general, Figure 3C shows a huge amount of residues with high B-factors and high RMSD.Based on these analyses, it is concluded that AF3 complex prediction is not reliable for intrinsically flexible regions or domains of PPI complexes.

Conclusion
We examine the use of AlphaFold3 (AF3) predicted complex structures for the prediction of proteinprotein binding free energy (BFE) changes upon mutation for the S8338 dataset in the SKEMPI 2.0 database.We noted that AF3 complex structures lead to a relatively good predicted Pearson correlation coefficient of 0.86 but have generated 8.6% more in RMSE compared to using PDB complex structures.
Essentially, AF3 is a promising technology for predicting protein-protein complexes, even capable of contributing to mutation-induced BFE change predictions.
In total, 317 AF3 complexes inferred by the SKEMPI 2.0 database were structurally aligned with original ones in the PDB to investigate AF3 predictions that have poor RMSD in structural alignment.
Through our analysis, we discovered that current AF3 performance metrics such as ipTM does not correlate with RMSD alignment scores.On the other hand, preliminary findings show that high RMSD are strongly correlated to high B-factors, indicating AF3 predictions are not reliable for highly flexible protein regions or domains.

Feature generation for MT-TopLap
Among all features used in MT-TopLap prediction, persistent Laplacians (PL) plays the most significant role in validating the reliability of AF3 structures.With accurate 3D SARS-CoV-2 receptor binding domain (RBD) -Angiotensin-converting enzyme 2 (ACE2) complexes, PLs correctly predicted the dom-inance of the Omicron BA.4 and BA.5 two months preceding the public announcement by World Health Organization (WHO) in June 2022 [18].
In this section, we provide the important main mathematical framework instrumental for understanding element-specific PL descriptors applied in our feature generation.Element-specific topological approaches were introduced in earlier work [13,24].In the simplicial complex and PL framework [19], we emphasize their importance in identifying both harmonic and non-harmonic spectral characteristics that are crucial for understanding PPIs.
To generate the PL features for our MT-TopLap models, we categorize the atoms in PPIs into several subsets.These subsets are • Atoms in mutation sites A m , • Atoms in mutation neighborhood, i.e.Atoms within a distance r from the mutation site A mn (r), • Protein 1's atoms within distance r of the binding site A P 1 (r), • Protein 2's atoms within distance r of the binding site A P 2 (r).
Additionally, we also group atoms into different element specific categories such as {C, N, O}, denoted as A ele .These element-specific groups are crucial in identifying the various types of interactions in a PPI model, as per biophysical principles.For instance, the subsets A C ∩ A P 1 (r) and A C ∩ A P 2 (r) capture hydrophobic C-C PPIs while A N ∩ A P 1 (r) and A O ∩ A P 2 (r) capture hydrophillic N-O PPIs.
In addition to groups specific to elements, we also utilize distance functions like D mod , which exclude interactions between atoms from the same subset.For interactions between atoms A i and A j in sets P 1 and P 2 , D mod is defined as follows: Here, DE(•, •) refers to the Euclidean distance between two atoms.
By utilizing the groups specific to elements/sites, atoms are arranged into point clouds, which are then used to construct simplicial complexes.Specifically, a set of k + 1 atoms from an element/site-specific subset forms k + 1 independent points, which can be represented as a set The convex hull of k + 1 affinely independent points essentially forms a k-simplex.In simple terms, a point is a 0-simplex, an edge is a 1-simplex, a triangle is a 2-simplex, a tetrahedron is a 3-simplex, and so on for higher dimensions, forming a k-simplex.A simplicial complex is created by combining these finite simplices [25,26,27,28].There are numerous methods to construct a simplicial complex.For generating our persistent Laplacian-based features, we employed the Vietoris Rips complex for dimension 0 and the Alpha complex for dimensions 1 and 2. A Vietoris-Rips (VR) complex is an abstract simplicial complex that forms simplices by connecting any subset of points with a diameter not exceeding a certain threshold.
On the other hand, an Alpha complex is a group of subcomplexes derived from a Delaunay triangulation, subject to a radius constraint that does not exceed a certain threshold.The Delaunay triangulation is a geometric structure that divides the convex hull of a set of points in a plane into triangles.
For a simplicial complex K, a k-th chain c k is the formal sum of k-simplicies in K, i.e.
Here, the condition that the boundary of a boundary is empty must be satisfied.Defining the adjoint op- From here, the k-topological Laplacian, a linear operator In understood as a zero map.
In our model, the key topological features are the eigenvalues of combinatorial Laplacian matrices.
These eigenvalues are not dependent on the orientation choice [29].Moreover, the total number of zero eigenvalues, or their multiplicity, in L k equates to the kth Betti number, β k , as per the combinatorial Hodge theorem [30].These Betti numbers are topological invariants that represent the k-dimensional holes in a k-simplicial complex.For instance, β 0 , β 1 , and β 2 denote the number of independent components, loops, and cavities, respectively.Overall, the zero and non-zero eigenvalues embody the harmonic and non-harmonic spectra of combinatorial Laplacian matrices.The non-harmonic spectra offer additional homotopic shape information that Betti numbers lack.
A single simplicial complex is not enough to capture all the topological information from a single protein-protein complex.By combining combinatorial Laplacian and multiscale filtration, we monitor the variations of harmonic and non-harmonic spectra by adjusting a filtration parameter such as radii/diameter for VR complex [19].For an oriented simplicial complex K, a filtration creates a nested sequence of simplicial complexes (K t ) m t=0 of K, As the value of the filtration parameter increases, PL generates a sequence of simplicial complexes.
Based on this nested sequence of simplicial complexes, we can produce a sequence of combinatorial . By changing the filtration parameter and performing diagonalization on the k-combinatorial Laplacian matrix, the characteristics of topology and spectrum can be examined from each L k (K t ) (0 ≤ t ≤ n).The eigenvalues of L k (K t ) can be sorted in ascending order Spectra(L t k ) = {(λ 1 ) t k , (λ 2 ) t k , • • • , (λ n ) t k } where L t k is an n × n matrix.The p-persistent k-combinatorial Laplacian can also be extended based on the boundary operator.Ultimately, these PL descriptors help MT-TopLap track the changes of harmonic and non-harmonic spectra in 3D protein-protein complexes, capturing both the intrinsic topological changes and homotopic shape evolution throughout its filtration process.Figure S2 depicts the harmonic and non-harmonic spectra generated from a point cloud using Alpha complex and D mod -based filtration.Further details of PL's mathematical framework is provided in Supplementary Information.

Figure 1 :Figure 2 :
Figure 1: A: The cartoon representation of ribonuclease inhibitor-angiogenin complex (PDB ID: 1A4Y).The ribonuclease inhibitor is shown in blue and the angiogenin is in green.The mutation spots of 1A4Y in the S8338 dataset are indicated in red.B: The structural alignment of 1A4Y with its AF3 predicted complex.C: The boxplot for RMSD, ipTM and pTM distributions of 317 predicted AF3 protein-protein complexes.RMSDs refer to the overall RMSD calculated by structurally aligning an AF3 complex with its original PDB complex.D: The breakdown of AF3 protein-protein complexes based on their ipTM and pTM scoring criteria.
Figures 2A and D. Therefore, there is little correlation between ipTM score and prediction RMSD.In general, RMSD also influences the final absolute prediction errors in Figure 2C.Future improvements to AF3 should incorporate RMSD as part of its prediction performance.A C

Figure 3 :
Figure 3: A: The B-factor representation of PDB ID: 4YH7.Atoms colored in red have high B-factors while atoms colored in blue have lower B-factors.B: Illustration of structural alignment for PDB ID: 4YH7 (in blue) and its AF3 predicted complex (in orange).C: The residue-based B-factor values plotted against all the residue RMSD calculated from all 317 AF3 predicted complexes.A log-scale is taken on the y-axis.
matrix representations, we denote B k as an m × n matrix of the boundary operators under the standard bases {σ k i } n i=1 and {σ k−1 j } m j=1 of C k and C k−1 .In a similar way, the transpose boundary matrix B ⊤ k is used to denote the matrix representation of ∂ * k with respect to the same ordered bases of the boundary operator ∂ k .This naturally leads to the k-combinatorial Laplacian matrix, which is an